Preface


The  effects  of  clutter  on  search  (S)  and  target  acquisition  (TA)  are 
of  current  interest  to  the  U.S.  Army  Night  Vision  as  well  as  the 
North  Atlantic  Treaty  Organization  (NATO)  working  group 
investigating  camouflage,  concealment,  and  deception  evaluation 
techniques.  Wendell  Watkins  of  the  U.S.  Army  Research 
Laboratory  (ARL)  and  Mathee  Valeton  of  the  Human  Factors 
Research  Institute  (TNO)  of  The  Netherlands  are  members  of  this 
NATO  working  group.  Discussions  between  them  in 
1998 resulted in an invitation by TNO for Wendell Watkins to
perform  a  joint  experiment  at  TNO  under  the  ARL  Professional 
Exchange  Program.  The  experiment  performed  was  designed  to 
evaluate  the  benefits  of  using  wide  baseline  stereo  vision  over 
single  line  of  sight  (mono)  vision.  One  of  the  important  goals  was 
to  show  how  stereo  vision  could  be  used  to  mitigate  the  effects  of 
clutter  on  S  and  TA,  and  results  indicate  that  stereo  vision  can  be 
effectively  used  to  reduce  false  alarm  detection  rates. 

Executive Summary


As the sensitivity and resolution of imaging systems have
improved, the targets that they were intended to detect
have continued to become less conspicuous. This issue was taken
up by the North Atlantic Treaty Organization in the Camouflage,
Concealment, and Deception Evaluation Techniques working
group. This group's research results showed that addressing
concerns such as camouflage design requires a better
understanding of search (S) and target acquisition (TA).

One  possible  means  of  improving  S  and  TA  and  clutter  rejection 
may  be  through  the  use  of  stereoscopic  vision.  However,  the 
ability  of  stereoscopic  vision  to  make  targets  "pop  out"  of  scenes 
has  not  yet  been  exploited  because  of  the  cost  of  dual-detection 
systems.  Fortunately,  the  cost  of  these  systems  is  dropping,  and  at 
the  same  time,  the  advantages  may  offset  additional  costs. 

An  S  and  TA  test  was  conducted  that  provided  single  (mono)-  and 
wide  baseline  stereo  imagery  for  observer  testing.  The  database 
developed  for  observer  testing  contained  the  same  scene  with  and 
without camouflaged human targets present. The analysis results
of  imagery  from  the  second  of  two  sites  have  provided  valuable 
findings.  Analysis  of  variance  did  not  show  significant 
differences  between  single  line  of  sight  and  stereo  vision  in 
general;  however,  there  were  differences  in  the  observer 
responses: 

•  There  was  a  significant  difference  in  the  false  target  detections 
(between  mono  and  stereo  vision)  for  the  narrow  field  of  view 
(FOV)  cases. 

•  Analysis  of  target  range  effect  showed  that  there  was  better 
performance  for  the  small,  narrow  baseline  FOV  case  with 
longer  ranges. 

• Several targets could be identified as distribution outliers.
The rationale for the poorer stereo vision performance on
these targets relates to biased displays that favored mono
vision.

•  There  was  only  a  little  difference  in  total  number  of  correctly 
detected  targets.  Nonetheless,  this  research  suggests  that 
about  a  20  percent  increase  in  correctly  identified  targets  using 
stereo  vision  may  be  possible  to  obtain  if  proper  training  in  the 
use  of  stereo  vision  is  given  prior  to  testing  and  optimized 
displays  are  used. 


• It appears that as the FOV was decreased from medium to
small, mono vision results showed an increase in false alarms,
possibly from the effects of global clutter.

• About one-half of the observers showed a substantial decrease
in false alarms, indicating again that with proper training in
the use of stereo vision and optimized displays, the number of
false alarms may be decreased by a factor of 2 using stereo
vision.

• Finally, data were obtained that can be used to optimize the
display for observer performance using stereo vision.


1.  Introduction 


Recently  there  have  been  some  spectacular  applications  of 
stereoscopic  (stereo)  vision.  For  example,  stereo  vision  was  used 
on  the  Mars  Lander  for  navigating  on  the  planet  surface.  It  was 
also  used  to  perform  the  complex  underwater  exploration  of  the 
Titanic.  Most  animals  have  developed  stereo  vision  through 
evolution.  Although  a  significant  portion  of  the  human  brain  is 
devoted  to  deriving  motion  and  depth  cues  through  complex 
processing  of  the  imagery  from  both  of  our  eyes,  stereo  vision  is 
not  being  widely  utilized.  One  of  the  reasons  for  this  is  that 
poorly  displayed  stereo  images  or  video  can  produce  severe 
eyestrain.  At  the  same  time,  some  positive  comparisons  showing 
the  benefits  of  using  stereo  over  mono  vision  have  been  made. 
[1,2] Efforts to better understand and model the complex brain
process  used  to  derive  the  3-dimensional  (3-D)  content  of  the 
scenes  that  our  two  eyes  use  for  stereo  have  become  quite 
sophisticated  and  continue  to  evolve  (including  the  effects 
introduced  by  the  display  system).  [3,4,5]  Despite  these  efforts, 
the use of stereo vision still has not been fully exploited. For
example, the recently published Practical Handbook on Image
Processing for Scientific Applications devotes only a dozen pages
out of almost 600 to stereo vision applications. [6] The recent
improvements in heads-up and head-mounted displays may open
the door for more widespread use of stereo vision, especially since
the utility of sophisticated 3-D scene modeling is enhanced by the
use of stereoscopic displays. One area that has not been
addressed is the use of wide baseline stereo vision for
search (S) and target acquisition (TA). [7]

The rationale for performing the research presented in this paper
is derived from the results of the Distributed Interactive
Systems Search & Target Acquisition Fidelity (DISSTAF) test
conducted at Fort Hunter-Liggett, CA in 1995. The visible data
sets  collected  by  the  Dutch  are  currently  being  used  to  evaluate 
the  camouflage,  concealment,  and  deception  performance  models 
for  the  NATO  SCI-12  Working  Group.  [8]  A  group  from  the  U.S. 
Army  Research  Laboratory  (ARL)  collected  wide  baseline  stereo 
imagery at the DISSTAF Test. Showing this stereo imagery to
some of the observers used for the DISSTAF Test indicated
that there are depth cues that can be used at multiple-kilometer
ranges for S and TA tasks. These results, coupled with the
application  of  stereo  vision  for  detecting  camouflage,  need  to  be 
quantified  for  comparison  with  the  biocular  single  line  of  sight 
(LOS)  (mono  vision)  S  and  TA  methodology.  [9]  The  problem  is 
that there are currently no good models for handling clutter in
imagery,  even  for  single  LOS  imagery  analysis,  especially  when 
the  targets  are  camouflaged.  This  deficiency  was  recently 
highlighted by James Ratches of the U.S. Night Vision at the SPIE
AeroSense Symposium in an invited overview paper of Night
Vision's past, present, and future efforts. [10] Notably, clutter
quantification was at the top of the list for future research.


2.  Field  Experiments 


An  S  and  TA  test  was  performed  under  an  exchange  scientist 
program  with  the  Netherlands  Organization  for  Applied  Scientific 
Research  (TNO),  Human  Factors  Research  Institute  at  Soesterberg, 
The  Netherlands,  in  September  1998.  The  test  was  performed  at  a 
military  training  base,  using  four  scientists  from  TNO  wearing 
Dutch forest camouflage uniforms as participants. Sets of wide
baseline stereo photos were obtained for targeted and
nontargeted scenes at two sites. The targeted and
nontargeted  scene  photographic  slides  were  taken  on  the  same 
day  within  a  few  minutes  of  each  other.  The  imagery  obtained 
was  taken  with  35-mm  cameras  with  200-mm  lenses  for  target 
ranges from 100 to 900 m. A single field of view (FOV) was used
for all of the targeted and nontargeted scenes at each site. The
photos were taken with color slide film and were digitized to
3K x 2K pixel resolution. The imagery data sets were used to perform
S  and  TA  tests. 

2.1  Rationale  for  Target  and  Site  Selection 

There  is  no  standard  method  for  comparing  mono  versus  stereo 
vision  for  various  S  and  TA  tasks.  Hence,  the  targets  were 
positioned  with  the  objective  of  quantifying  the  impact  of  scene 
clutter on S and TA for both mono and stereo LOSs. The simplest
targets to use were humans in camouflage attire that matched
the surroundings well enough that the targets were not obvious.
Sufficient clutter also had to be present to assess target
placement in different clutter regions. The assessment imagery
database also had to have several LOSs for stereo vision for
comparison with mono vision performance for the same task. The
human interocular separation for maximum unaided depth
perception ranges is about 10 mrad. Multiples of this separation
were used for assessing the performance of stereo versus mono
vision for the same S and TA task.

With  a  35-mm  camera,  a  camouflaged  human  can  only  be  detected 
in  digitized  photographic  film  slides  to  a  range  of  about  300  m. 
Therefore, 200-mm lenses were used that yielded an FOV of 15° x
10°. Each camera's LOS was positioned with a conspicuous
feature  in  the  center  of  the  FOV.  There  were  24  total  target 
locations  identified  for  each  of  the  two  sites  that  represented  easy 
to  difficult  targets  for  detection.  These  locations  were  referenced 
to  several  prominent  scene  features  that  were  ranged  with  a 
binocular  range  finder. 

2.2 Site 1 and Site 2 Measurements

Sufficient  35-mm  cameras  and  200-mm  lenses  were  obtained  to  set 
up four stereo cameras. The targets used were humans wearing
Dutch forest camouflage uniforms, as shown in appendix figure
A-1. The test was conducted over a 2-day period at the
Soesterberg Artillery Facility, where two sites were used:

1.  Site  1  had  shorter  ranges  (110  to  675  m)  with  sunny  and 
partly  cloudy  conditions. 

2. Site 2 had longer ranges (400 to 900 m) with
cloudy/rainy conditions.

Site  1  had  four  camera  positions  with  6-m  separation,  and  the 
second  had  three  camera  positions  with  10-m  separation  (as 
shown in appendix figure A-2). Targets were arrayed in each of
six  different  target  locations.  Slide  photos  of  designated  target 
positions,  targeted  scenes,  and  nontargeted  scenes  were  taken  at 
each  of  two  test  sites.  The  result  was  an  imagery  database  with 
24  targets  for  four  stereo  LOSs  for  site  1  and  three  stereo  LOSs  for 
site  2.  Because  photos  were  taken  with  and  without  the  targets 
present,  it  is  possible  to  analyze  the  impact  of  target  placement 
and  background  clutter  levels. 

The  targets  were  positioned  in  six  different  locations  with  overall 
target  ranges  from  approximately  110  to  660  m  for  site  1  and 
400  to  900  m  for  site  2.  The  cameras  were  placed  on  tripod 
mounts in a straight line (perpendicular to the LOS) to the middle
of  the  target-scene  FOV,  about  1.5  m  above  the  ground.  When  the 
four  targets  were  in  their  first  position,  the  LOS  from  each  of  the 
stereo  cameras  to  each  target  had  to  be  checked  to  ensure  that  the 
LOS  was  not  blocked.  Then,  the  targets  held  up  large  white  cards 
to  designate  their  position,  and  one  photographic  slide  was  taken 
as  quickly  as  possible  from  each  of  the  stereo  cameras.  The  targets 
were  then  instructed  to  turn  around  and  hide  their  card  and  take 
either  standing  or  crouching  positions.  By  facing  away,  the 
targets  did  not  expose  face  or  hand  features  that  are  strong 
detection  cues  for  visible  images.  Two  slide  photos  were  taken  of 
these  targeted  scenes  from  each  of  the  stereo  cameras.  Then  the 
targets  were  instructed  to  hide,  and  two  slide  photos  were  taken 
of these nontargeted scenes. The 24 target positions were
obtained by repeating this process six times. The target scene for
site 1 without targets is shown in appendix figure A-3. A
composite target scene for site 1 with all 24 targets with their
white signs is shown in appendix figure A-4. The corresponding
scenes for site 2 are shown in appendix figures A-5 and A-6. Note
that only 23 out of 24 targets could be found in site 2.

3. Laboratory Tests


A  cursory  examination  of  the  two  testing  sites  (shown  in  appendix 
figures  A-3  through  A-6)  indicated  that  it  was  much  easier  to 
locate  the  targets  for  site  1.  Therefore,  the  initial  data  analysis 
necessary  for  producing  an  imagery  presentation  for  observer 
testing  was  performed  on  the  site  1  images. 

3.1  Site  1  Presentation  Preparation 

Appendix  figure  A-4  shows  the  location  of  the  24  target  positions 
from the far right camera (of the four cameras with baseline
separations  of  6  m  between  each  one).  The  targets  are  confined  to 
just  over  one-half  of  the  vertical  extent  of  the  whole  3072  x 
2048  pixel-digitized  image.  The  initial  approach  was  to  place  a 
rectangular  grid  over  the  picture  to  isolate  the  targets  in  separate, 
rectangular  sectors  so  that  the  targets  were  not  divided  into 
multiple  sectors.  This  result  was  obtained  with  only  minor  target 
clipping  by  using  a  rectangular  array  of  7  sectors  wide  x  4  sectors 
high  with  each  sector  being  396  pixels  wide  x  264  pixels  high. 
Adobe®  Photoshop®  software  was  used  to  splice  together  some  of 
these  sectors  from  the  large  images  containing  the  targets  because 
not  all  of  the  targets  in  the  resulting  multiple-targeted  sectors 
were  present  at  the  same  time.  The  array  was  labeled  as  shown  in 
appendix  table  A-1  with  each  sector  representing  a  1.9°  x  1.3°  FOV. 
There  were  9  sectors  with  no  targets,  15  sectors  with  1  target, 
3  sectors  with  2  targets,  and  1  sector  with  3  targets.  This  set  of 
28  small  FOVs  represented  the  target  scenes  whether  or  not  a 
target  was  present.  The  same  grid  was  used  on  the  large  digitized 
image  with  no  targets  present  to  produce  a  set  of  28  small, 
nontargeted FOVs. Because the observer task was intended to
investigate clutter in the form of false targets, the sector scenes
contained an unrestricted number of targets. The observers' task
was  to  determine  if  there  were  none,  one,  or  more  than  one  target 
in  each  scene. 
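
The grid arithmetic above can be checked with a short sketch. It is illustrative only: the grid origin (x0, y0), the row-letter and column-number labeling, and the 15° x 10° full-image FOV used for the angular scaling are assumptions, not details confirmed by the test records.

```python
from dataclasses import dataclass

IMAGE_W, IMAGE_H = 3072, 2048        # digitized slide size in pixels
SECTOR_W, SECTOR_H = 396, 264        # sector size in pixels
COLS, ROWS = 7, 4                    # sectors across and down
FULL_FOV_DEG = (15.0, 10.0)          # assumed full-image FOV in degrees


@dataclass(frozen=True)
class Sector:
    label: str   # hypothetical label: row letter plus column number
    left: int
    top: int
    right: int
    bottom: int


def build_grid(x0=0, y0=0):
    """Return crop boxes for the 7 x 4 sector grid, row by row.

    x0, y0 are the pixel coordinates of the grid's top-left corner
    (hypothetical; the text does not give the grid origin).
    """
    if x0 < 0 or y0 < 0:
        raise ValueError("grid origin must be inside the image")
    if x0 + COLS * SECTOR_W > IMAGE_W or y0 + ROWS * SECTOR_H > IMAGE_H:
        raise ValueError("grid does not fit inside the digitized image")
    sectors = []
    for row in range(ROWS):
        for col in range(COLS):
            left = x0 + col * SECTOR_W
            top = y0 + row * SECTOR_H
            sectors.append(Sector(f"{'ABCD'[row]}{col + 1}",
                                  left, top,
                                  left + SECTOR_W, top + SECTOR_H))
    return sectors


def sector_fov_deg():
    """Angular size of one sector: full FOV scaled by the pixel fraction."""
    return (FULL_FOV_DEG[0] * SECTOR_W / IMAGE_W,
            FULL_FOV_DEG[1] * SECTOR_H / IMAGE_H)
```

Under these assumptions the grid spans 2772 x 1056 pixels, which is consistent with the targets occupying just over one-half of the vertical extent of the image, and each sector subtends roughly 1.9° x 1.3°.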

3.2 Site 1 Image Display

Computer  monitor  displays  were  the  only  means  available  to 
observers.  Adobe®  Photoshop®  was  used  to  produce  sets  of  targeted 
and  nontargeted  sector  .bmp  files  of  792  x  528  pixels  or  1.2  Mbytes 
for  the  RGB-color  image  from  the  original  396  x  264  pixel  images. 
There  were  56  total  images  for  the  right  LOS.  In  order  to  obtain 
the  correct  stereo  image  for  the  other  LOSs,  the  center  terrain 
feature  of  the  right  LOS  was  found  in  the  other  LOS  whole-scene 
images  and  a  396  x  264  rectangular-image  sector  was  cut  out 
around  this  center  feature.  As  the  angular  separation  increased, 
there were a few sectors that could not be matched. The targeted
and nontargeted sectors were placed in random order so that
images from the different ranges were mixed, with the constraint
that the targeted and nontargeted scenes of the same sector were
separated by several intervening images of different sectors.
Finally, because of the limited number of
sectors,  targeted  sector  A4,  which  had  an  easily  detected  target, 
was  shown  first  as  a  learning  image.  Microsoft®  PowerPoint®  was 
used  to  produce  four  separate  slide  shows  of  128  scenes.  The 
targeted  and  nontargeted  scenes  were  each  separated  by  a 
numbered scene with a black background. The first scene in the
slide  show  was  one  of  the  numbered  scenes  with  black 
backgrounds. 
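
One way to produce a random ordering with this kind of separation constraint is simple rejection sampling: shuffle, check the spacing, and reshuffle on failure. The sketch below is an assumption about how such an ordering could be generated, not a record of the procedure actually used. The function name, the sector identifiers, and the minimum gap of three intervening images are all hypothetical (the text says only "several").

```python
import random


def constrained_order(sector_ids, min_gap=3, seed=None, max_tries=10000):
    """Randomly order targeted and nontargeted versions of each sector.

    The two versions of the same sector must have at least `min_gap`
    other images between them. Uses rejection sampling: reshuffle
    until the constraint holds or `max_tries` is exhausted.
    """
    rng = random.Random(seed)
    items = [(sector, kind)
             for sector in sector_ids
             for kind in ("targeted", "nontargeted")]
    for _ in range(max_tries):
        rng.shuffle(items)
        last_seen = {}
        ok = True
        for index, (sector, _kind) in enumerate(items):
            if sector in last_seen and index - last_seen[sector] - 1 < min_gap:
                ok = False
                break
            last_seen[sector] = index
        if ok:
            return list(items)
    raise RuntimeError("no valid ordering found; relax min_gap")


# Example with 28 hypothetical sector identifiers (56 images in total).
order = constrained_order([f"S{i}" for i in range(28)], min_gap=3, seed=1)
```

Rejection sampling is adequate here because the set is small; for 56 images and a gap of three, a valid ordering typically turns up within a few dozen shuffles.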


3.3 Site 1 Results

Because the results of site 1 testing affected how the site 2 test
was designed, it is necessary to summarize the results of the site 1
observer test. Detailed results are given in "Depth Perception
Applied to Search and Target Acquisition." [11] When the
observers were shown the slide presentation, the locations of the
real and false target detections were recorded as well as the
search time for each sector presented. In the discussion that
follows, the observers were shown only the right LOS images on a
single-monitor display.

In  general,  the  search  times  for  the  nontarget  sectors  are  longer 
than  for  the  target  sectors.  In  fact,  there  were  only  two  cases 
where  the  target  sectors  had  longer  times  than  the  overall  average 
search time of 6.25 s. Longer times in these sectors are logical
because they are the most difficult sectors in which to find
targets (see the table for the average search times). [11]

Table. Average search time for nontargeted/targeted sectors

                     Average search time
                     Nontargeted sectors    Targeted sectors
Low difficulty       -                      4.0 s
Medium difficulty    6.7 s                  7.5 s
High difficulty      7.5 s                  -

The  range  of  average  search  times  for  individual  observers  was 
from  1.75  to  13.88  s.  There  was  a  correlation  between  poor  overall 
scores  and  longer  search  times.  To  better  compare  the  results  of 
the  different  observers  with  respect  to  the  differences  between 
times  taken  to  search  individual  sectors,  the  search  times  of  each 
observer  were  divided  by  that  observer's  average  search  time  to 
obtain  normalized  search  times.  When  this  was  done,  there  were 
852  sectors  where  no  target  or  a  false  target  was  detected  taking 
an  average  normalized  time  of  1.15.  There  were  828  sectors  where 
targets  or  false  targets  were  found  taking  an  average  normalized 
time  of  0.81.  In  general,  it  also  took  longer  to  determine  that  there 
was  no  target  or  a  false  target  present  than  when  there  was.  When 
the  false  targets  present  were  very  target-like,  as  in  the  case  for 
most  of  the  medium  difficulty  nontargeted  sectors,  the  detection 
time  was  short  and  the  nondetection  time  was  long.  In  sectors 
where  most  of  the  observers  found  no  targets,  the  search  times 
increased,  and  when  detection  was  made,  it  was  a  false  target  (this 
is  similar  to  over  training  a  neural  net). 
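
The normalization described above (each observer's sector search times divided by that observer's own average) can be sketched as follows. The function name and the sample data are hypothetical and are not taken from the test database.

```python
def normalize_search_times(times_by_observer):
    """Divide each observer's search times by that observer's mean time.

    `times_by_observer` maps an observer identifier to a list of search
    times in seconds, one per sector. A normalized value above 1 means
    the observer spent longer than their own average on that sector.
    """
    normalized = {}
    for observer, times in times_by_observer.items():
        if not times:
            raise ValueError(f"observer {observer!r} has no search times")
        mean_time = sum(times) / len(times)
        normalized[observer] = [t / mean_time for t in times]
    return normalized


# Hypothetical data: a fast and a slow observer with the same pattern.
sample = {"fast": [1.0, 2.0, 3.0], "slow": [8.0, 16.0, 24.0]}
result = normalize_search_times(sample)
```

In this made-up example both observers normalize to the same values, which is the point of the step: it removes the large spread in absolute search speed between observers so that sector-to-sector differences can be compared.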

This  research  provided  a  few  examples  of  how  moderate  to 
difficult  targets  are  missed  in  scenes  when  there  is  an  easier  target 
or  false  target  detected  first.  Specifically,  sector  B4  had  three 
targets present with positive identification (ID) difficulties of low,
medium, and high, located in the left center, right center, and
center of the sector, respectively. This image provided a good
example of how the human-detection process works: When an S
and  TA  task  is  given,  a  fuzzy  notion  is  formulated  of  what  the 
target  of  interest  is.  The  scenes  are  searched  for  the  fuzzy  target. 
If  a  detection  of  a  real  or  false  target  is  made,  the  target  construct 
becomes  well-defined  and  the  scene  search  is  rapidly  completed 
thereafter  even  if  multiple  targets  are  present  and  detected.  This 
refinement  in  the  target  sought  can  cause  targets  to  be  missed.  In 
this  particular  sector,  there  is  a  fairly  easy  standing  target  to  detect 
right  in  the  middle.  The  crouching  target  to  the  left  and  away 
from  the  tree  line  was  detected  only  when  it  was  seen  first;  only 
2  of  the  30  observers  accomplished  this.  Both  of  these  observers 
were able to then detect the easy standing target in the center, but
did not detect the medium difficulty standing target in the right
center. A similar occurrence happened in sector A1, where there
was  a  bush  that  very  much  resembled  a  standing  target  in  the 
center  of  the  sector.  This  made  the  detection  of  the  crouching  real 
target  in  the  bottom  center  more  difficult. 

Finally,  an  initial  attempt  at  presenting  the  stereo  slide  shows  to 
observers  revealed  some  distinct  problems.  The  observers  found 
that  the  images  in  the  closest  sectors  could  not  be  fused  for  the 
FOV of the entire sector; there simply was too much parallax.
At 110 m, the approximately 1.9-m-high human targets represent
about  90  percent  of  the  sector  image  height  (238  pixels).  At  650  m, 
the  human  targets  represent  only  about  15  percent  of  the  sector 
image  height  (40  pixels).  With  a  6-m  platform  separation  between 
the  right  and  right  center  cameras,  the  resulting  shift  between  the 
bottom  and  top  elements  of  the  scenes  in  the  D  sector  is  1.8  m 
(225  pixels)  with  the  standing  target  experiencing  90  percent  of 
this  shift  from  bottom  to  top.  In  the  C  sector,  the  parallax  shift 
bottom to top is 5.0 m (180 pixels). This time a 1.9-m target in the
bottom of the scene would represent only 45 percent of the height
with only about 81-pixel parallax shift from the bottom to top of
the target. Stereo fusion at this range was possible but not
comfortable. Finally, in the B sector the parallax shift bottom to
top is 6.3 m or 145 pixels. Now, the 1.9-m target in the bottom of
the scene represents just 25 percent of the height with only about
36-pixel parallax shift from the bottom to top of the target. These
images  could  be  fused  easily  and  showed  good  depth  perception. 
Hence,  to  be  able  to  compare  the  results  of  mono  to  stereo  vision 
for  the  near  targets  would  require  a  display  of  an  FOV  about  one- 
third  the  one  that  was  used  for  the  closest  sectors. 

3.4  Site  2  Test  Design 

The  main  purpose  of  the  observer  test  was  to  address  the  question 
of  which  viewing  condition  (mono  or  stereo)  gives  the  best  S  and 
TA  results.  Several  lessons  were  learned  from  the  observer  test  on 
site  1  data  even  though  the  Microsoft®  PowerPoint®  presentations 
produced  in  stereo  could  not  be  used.  Using  the  stereo  imagery 
displays  from  site  1  with  the  6-m  baseline  and  the  FOV  chosen, 
only  the  images  with  ranges  of  300  m  or  more  could  be  readily 
fused.  Hence,  for  the  same  FOV  images  for  site  2  with  10-m 
baseline,  the  ranges  500  m  and  larger  should  be  easily  fused.  But 
the  impact  of  FOV  on  S  and  TA  was  not  known;  therefore,  the 
three  following  FOVs  were  used:  (1)  One  very  close  to  the  one 
used  for  site  1  observer  test  (384  x  240  pixels  instead  of  396  x  264 
pixels),  (2)  one  50  percent  larger,  and  (3)  one  50  percent  smaller. 
This matched the standard Adobe® Photoshop® gridline block sizes
of 24 x 24 pixels as summarized in appendix table A-2. Also, the
impact  of  baseline  separation  was  not  known.  Hence,  for  the 
smaller  FOVs  both  10  m  and  20  m  baseline  separations  were  used. 
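
The step from "300 m or more with a 6-m baseline" to "500 m and larger with a 10-m baseline" follows if the limit on comfortable fusion is taken to be a fixed convergence angle, so that the minimum range scales in proportion to the baseline. The sketch below makes that assumption explicit, using the small-angle approximation. The function names are hypothetical, and the fixed-angle model is an interpretation of the text, not a stated result.

```python
def convergence_angle_mrad(baseline_m, range_m):
    """Small-angle convergence angle (mrad) between two LOSs to a point."""
    if range_m <= 0:
        raise ValueError("range must be positive")
    return 1000.0 * baseline_m / range_m


def min_fusible_range_m(baseline_m, ref_baseline_m=6.0, ref_range_m=300.0):
    """Scale the site 1 fusion limit (6-m baseline, 300 m) to a new baseline.

    Assumes the fusion limit corresponds to a constant convergence
    angle for the same displayed FOV.
    """
    return ref_range_m * baseline_m / ref_baseline_m


limit_angle = convergence_angle_mrad(6.0, 300.0)   # site 1 reference case
range_10m = min_fusible_range_m(10.0)              # site 2, adjacent cameras
range_20m = min_fusible_range_m(20.0)              # site 2, outer cameras
```

Under this model the site 1 limit corresponds to about 20 mrad, and a 10-m baseline reaches the same angle at 500 m, matching the text. The same scaling would put the 20-m baseline limit at about 1000 m for the same FOV, which may be one reason the wider baseline was paired only with the smaller FOVs.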

The  imagery  from  the  second  day's  testing  at  site  2  was  collected 
with  three  different  cameras.  Of  the  three  camera  positions,  the 
photos  from  the  left  camera  had  the  best  image  quality.  The 
center and right camera photos were a little blurrier, and all three
had slightly different color composition even though all the
cameras were set to the same exposure and aperture settings. The
200-mm lenses must have had optics with different color
transmission.  These  differences  did  not  cause  as  much  of  a 
problem  as  with  site  1  image  processing  of  the  stereo  image  pairs 
with  Adobe®  Photoshop®  because  the  overcast  light  rain  conditions 
tended  to  mute  the  color  differences  somewhat.  To  begin,  the  left 
LOS  was  used  as  the  reference.  A  composite  picture  of  all  of  the 
target  locations  (see  appendix  figure  A-6)  was  produced  by 
splicing  the  target  photos  with  white  location  cards  displayed 
onto  the  photo  with  the  first  four  target  positions. 

Observer testing was approached in the same way as the testing
performed for the site 1 imagery database. Instead of reducing
the number of
available  scenes  by  placing  only  one  target  in  each  scene,  both 
single  and  multiple  targeted  scenes  were  included  in  the  test. 
Even  so,  the  terrain  in  the  imagery  scene  (as  shown  in  appendix 
figures  A-5  and  A-6)  only  allowed  a  limited  number  of  targeted 
scenes  of  medium  and  large  size  to  be  extracted  for  the  observer 
tests.  There  were  two  targets  (one  in  the  bottom  center  at  236  m 
and  one  to  the  left  of  the  bunker  along  the  bottom  center  road  at 
335  m)  that  were  too  close  for  the  entire  image  to  be  easily  fused  in 
stereo  with  the  large  or  medium  FOV  images.  Nevertheless,  the 
target  next  to  the  bunker  was  included  in  a  large  FOV  scene  and 
both  were  included  in  medium  FOV  scenes. 

The  medium  FOV-scene  size  was  chosen  as  384  x  240  pixels  to 
closely  match  the  396  x  264  pixel  sectors  used  in  the  observer  test 
for  site  1.  The  size  was  chosen  for  ease  in  processing  the  different 
FOVs  using  Adobe®  Photoshop®,  since  the  standard  overlay  gridline 
blocks  are  24  x  24  pixels.  Therefore,  the  medium  FOV  is  16  blocks 
wide x 10 blocks high. This selection made it easy to get the
50 percent larger and smaller FOVs. The large FOV is 24 x
15 blocks and the small FOV is 8 x 5 blocks.
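
The block arithmetic can be written out in a few lines. This is only a sketch of the sizes stated above; the dictionary and function names are hypothetical.

```python
BLOCK_PX = 24  # gridline block edge in pixels, as stated in the text

# FOV template sizes in gridline blocks (width, height).
FOV_BLOCKS = {
    "small": (8, 5),
    "medium": (16, 10),
    "large": (24, 15),
}


def fov_pixels(name):
    """Return the (width, height) of an FOV template in pixels."""
    blocks_w, blocks_h = FOV_BLOCKS[name]
    return blocks_w * BLOCK_PX, blocks_h * BLOCK_PX


def linear_scale(name, reference="medium"):
    """Linear size of an FOV template relative to the reference template."""
    return fov_pixels(name)[0] / fov_pixels(reference)[0]
```

The resulting templates are 192 x 120, 384 x 240, and 576 x 360 pixels. Note that "50 percent larger and smaller" refers to linear dimensions (scale factors of 1.5 and 0.5), so the scene areas differ by factors of 2.25 and 0.25.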

By  selecting  various  positions  for  the  different  FOV  templates  in 
the  overall  scene,  a  distribution  with  different  numbers  of  targets 
was  obtained.  There  were  between  one  and  three  targets  in  the 
large FOV, one and five in the medium FOV, and one or two in
the small FOV. Because there were only a limited number of large
and  medium  FOVs,  they  were  combined  into  one  Microsoft® 
PowerPoint®  presentation,  and  the  small  FOV  images  were  put  into 
a  second  presentation.  There  were  only  23  total  targets  because 
one  could  not  be  found  (only  three  white  cards  were  visible  in  one 
of  the  six  target  locating  scenes).  There  were  10  large  FOV  scenes 
selected  with  22  of  the  23  targets  included.  There  were 
12  medium  FOV  scenes  selected  with  all  23  targets  present.  These 
sets  were  divided  into  two  groupings  of  5  large  and  6  medium 
scenes  that  contained  22  or  23  nonduplicated  target  locations.  To 
these 2 sets, 11 scenes were added that represented the
nontargeted  scenes  that  were  used  in  the  other  group.  Hence, 
each  group  contained  the  same  total  of  22  scenes  but  only  one-half 
of  them  had  targets.  From  site  1  imagery,  five  scenes  were 
selected  for  which  good  stereo  pairs  could  be  produced.  These 
were  added  to  the  beginning  of  both  sets  as  training  scenes.  The 
small  FOV  presentation  contained  20  targeted  scenes  and  the 
same 20 nontargeted scenes but with no target/nontarget pair in
close proximity.

Next  the  issue  of  wide  and  narrow  baseline  separation  was 
addressed.  The  large  and  medium  FOV  presentation  used  only 
the  10-m  baseline  stereo.  The  small  FOV  scenes  were  divided  into 
2  groupings  of  10  with  11  or  12  targets  in  each  group.  One  group 
was  displayed  with  10-m  baseline  stereo  and  the  other  with  20-m 
baseline  stereo.  The  targeted  scenes  shown  within  these  two 
groups  with  one  baseline  had  their  corresponding  nontargeted 
scenes  shown  with  the  other  baseline.  Then  two  separate 
presentations  were  made  up  of  both  the  large/medium  and  the 
small  FOVs  with  targeted  and  nontargeted  scenes  reversed.  A 
second random ordering of these sets was made and two more
flip-flopped  target/nontarget  presentations  were  again  made. 
Hence,  there  were  a  total  of  four  each  of  the  two  types  of  FOV 
presentations.  Examples  of  the  three  different  FOVs  and  the  level 
of  difficulty  of  target  detection  are  shown  in  appendix  figures 
A-7  through  A-10. 

The  level  of  difficulty  of  the  targets  was  quite  good  compared  to 
site  1  scenes.  One  of  the  problems  as  mentioned  above  with 
site  1  targets  was  that  they  were  too  easy  to  pick  out.  In  fact, 
80  percent  of  the  targets  were  correctly  identified  by  90  percent  of 
the observers as shown in appendix table A-3. Nonetheless, five
of  the  longer  range  A  and  B  sector  scenes  from  site  1  were 
effectively  used  to  train  the  observers  on  both  the  single  LOS  and 
stereo  vision  S  and  TA  task  (as  shown  in  appendix  figures 
A-11  through  A-15). 


3.5  Site  2  Image  Display 

In  order  to  present  the  images  to  the  observers,  two  separate 
computers  were  used  with  their  monitor  displays  side  by  side. 
The  person  running  the  test  could  use  cross-eyed  stereo  viewing 
of  the  monitor  displays  to  interpret  the  results  given  by  the 
observer.  The  monitor  displays  were  converted  to  video  by  two 
TView  Gold®  signal  converters  and  displayed  by  a  modified  pair 
of Virtual IO stereo goggles that allowed the left and right
displays to be driven by different video signal inputs. For mono
vision, the same left Microsoft® PowerPoint® presentation was
displayed by both computers. For stereo vision, the left view
Microsoft® PowerPoint® presentation was fed into the left goggle
display  and  the  other  Microsoft®  PowerPoint®  presentation  (center 
and  right  LOS)  was  fed  into  the  right  goggle  display.  The  target 
scenes  were  separated  by  a  black-numbered  scene  that  allowed 
the  viewer  to  retain  a  dark-adapted  state  during  the  course  of  the 
experiment.  The  room  lights  were  kept  low  to  provide  a 
noninterfering  light  level  for  optimum  use  of  the  stereo  goggles. 

4. Observer Database


Because  the  focus  of  this  paper  is  to  compare  stereo  vision  and 
mono  vision  for  S  and  TA  tasks,  only  the  database  from  the  site 
2  test  will  be  considered  here.  There  were  a  total  of  36  observers 
that  took  two  stereo  vision  tests  and  two  mono  vision  tests.  One 
test  included  22  or  23  targets  positioned  in  large  and  medium 
FOV  scenes.  The  other  test  included  23  targets  positioned  in  small 
FOV  scenes  with  the  stereo  portion  displayed  with  either  10-m  or 
20-m  camera  baseline  separation. 


4.1  S  and  TA  Task 


Some  of  the  most  useful  S  and  TA  information  can  be  obtained 
using  eye  tracking  of  the  observer.  Unfortunately,  this  type  of 
analysis  tool  was  not  available.  Hence,  an  S  and  TA  task  was 
given  with  an  associated  rating  system  to  obtain  the  desired  type 
of  response.  The  observers  were  split  into  two  groups  with  one 
group  performing  the  mono  test  first  and  then  the  stereo  2  weeks 
later.  The  second  group  took  the  stereo  test  first  and  the  mono 
test  2  weeks  later.  The  general  task  was  given  as  follows  for  both 
tests  with  the  stereo  portion  only  given  when  that  test  was  taken. 

•  General  task  instructions — Your  task  is  to  find  all  of  the  forest- 
camouflaged  personnel  targets  standing  or  squatting  in  the 
scenes  as  quickly  as  possible.  There  may  be  one,  none,  or 
more than one target in each scene. Once all targets have been
located you are to say "stop." Then tell how many targets
were found and their location T-L (top left), T-C, T-R, C-L, C,
C-R, B-L, B-C, or B-R; your search will be timed. For the
purpose of detection accuracy, 2 points will be added for every
target correctly identified, 3 points will be subtracted for every
missed target, and 1 point will be subtracted for every false
target identified (i.e., SCORE = 2 × [Positive ID] - 3 × [Missed
Targets] - 1 × [False Alarms]). The targets are not trying to
hide and expose only a small portion of their bodies, but the
difficulty in identifying them will range from obvious to very
difficult.  The  testing  conditions  were  overcast  with  light  rain. 

• Stereo test — For the stereo testing portion of the test the
observers  were  shown  an  example  of  what  scene  fusion  meant 
as  illustrated  by  appendix  figure  A-16.  The  observers  were 
then  trained  on  the  use  of  this  fusion  technique  to  isolate 
different  range  portions  of  the  training  scenes  from  site  1.  For 
both  portions  of  the  test  (mono  and  stereo)  the  observers  were 
told  that  the  concept  was  to  determine  which  of  the  two 
techniques  worked  better  for  the  S  and  TA  task  and  that  both 
should  be  approached  with  the  same  criteria.  They  were  told 
that  they  would  lose  more  points  for  missing  a  target 
(3  points)  than  they  would  get  if  they  correctly  identified  a 
target  (2  points)  and  that  they  would  lose  only  1  point  for  a 
false  target.  The  rationale  was  to  get  them  to  make  educated 
guesses  to  assess  the  impact  of  clutter  on  the  S  and  TA  process. 
Additionally,  after  both  tests  were  taken  and  before  they  were 
told  how  well  they  performed  on  either  test,  they  were  asked 
to  distribute  5  points  between  the  2  approaches.  The  more 
points  assigned  meant  the  better  they  liked  that  particular 
mode  of  scene  presentation. 
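
The scoring rule stated in the general task instructions can be written out as a short calculation. The following is a minimal sketch only; the function name, argument names, and the example observer are hypothetical and do not come from the study's own analysis software.

```python
def observer_score(positive_ids: int, missed_targets: int, false_alarms: int) -> int:
    """Score rule from the task instructions:
    +2 per correct ID, -3 per missed target, -1 per false alarm."""
    if min(positive_ids, missed_targets, false_alarms) < 0:
        raise ValueError("counts must be non-negative")
    return 2 * positive_ids - 3 * missed_targets - 1 * false_alarms


# Hypothetical observer (not study data): 23 targets shown,
# 12 found, 11 missed, and 5 false alarms reported.
example = observer_score(positive_ids=12, missed_targets=11, false_alarms=5)
print(example)  # 2*12 - 3*11 - 5 = -14
```

Because a miss costs more than a correct identification earns, an observer who finds only about half of the targets ends up with a negative score even with no false alarms, which is consistent with the negative score ranges reported in the results.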

4.2  Target  Identifications 

The  observers  were  screened  by  participating  in  a  stereoscopic 
visual  acuity  test  to  determine  if  they  could  see  in  stereo  and  how 
well.  [12]  A  testing  schedule  was  set  up  to  ensure  they  could  take 
both  tests  2  weeks  apart.  The  observers  were  presented  the 
PowerPoint®  slide  show  after  they  had  become  dark-adapted  to  the 
room  lighting.  They  were  shown  the  five  training  slides  first. 
Then, they were shown a black-background slide with a number
on  it;  this  slide  was  easily  stereo  fused.  They  were  then  timed  as 
they  searched  the  test  scene  for  targets.  When  they  said,  "stop"  the 
watch  was  stopped  and  the  time  recorded.  They  then  told  the 
number  and  location  of  the  targets  found.  The  person  running  the 
test  was  viewing  the  same  scene  on  the  computer  monitors  and 
would  determine  whether  there  was  a  possible  ambiguity  in 
correctly  identifying  real  targets.  If  there  was  any  question,  the 
scene  was  revisited  and  the  computer  arrow  of  the  observer's 
dominant  eye  was  used  to  point  out  the  exact  location  where  the 
target  in  question  was  located  to  determine  if  a  positive  ID  was 
made. 

5. Results


An analysis of variance (ANOVA) was performed on the
observation data collected at the U.S. Military Academy (USMA),
West Point. Copies of the score sheets were used to build a
database in the USMA's Minitab 13 statistical software with
4608 data entries or 7 variables for each observer. These included
mono or stereo vision; large, medium, or small FOV; and narrow
or wide stereo baseline. There were also several nuisance
variables  that  included  gender,  test  order,  and  visual  acuity.  The 
detailed  analysis  performed  is  located  in  "A  Comparison  of 
Observer  Task  Performance:  Three  Dimensional  Versus  Two 
Dimensional  Displays."  [13]  Therefore,  this  report  will  provide  an 
overview  of  the  findings  rather  than  the  complete,  detailed 
analysis. 

First,  some  of  the  nuisance  factors  showed  statistical  significance: 
males  did  better  than  females.  Visual  acuity  was  also  significant 
with  the  30  and  60  mrad  visual-acuity  observers  performing 
better.  Also,  the  order  results  were  different,  but  not  statistically 
significant.  It  appears  that  the  observers  learned  how  to  better 
discriminate  the  false  targets  when  they  saw  the  stereo  first  and, 
hence,  did  better  on  the  mono  portion  of  the  test  than  those 
observers  that  took  the  mono  test  first.  The  analysis  of  the  mono 
versus  stereo  vision  showed  a  difference,  but  it  was  not 
statistically  significant.  The  analysis  of  the  FOVs  showed 
statistical  significance  and  the  results  were  better  with  the  small 
FOV.  Finally,  the  analysis  of  the  baselines  showed  that  the 
10-m  baseline  outperformed  the  20-m  baseline  and  the  mono 
vision,  but  did  not  reach  statistical  significance.  Based  on  these 
results, two main issues require further investigation:

•  How  are  the  stereo  vision  results  different  than  the  mono 
vision? 

•  And  what  are  the  requirements  for  optimizing  the  stereo 
vision  display? 

5.1 Database Suitability

The  analysis  will  begin  with  the  suitability  of  the  target-detection 
difficulty  of  site  2  scenes.  This  is  summarized  in  the  appendix 
tables  A-4  and  A-5.  The  difficulty  level  is  divided  into  three  equal 
categories  for  each  FOV.  The  top  third  is  listed  as  easy  (E),  the 
middle  as  (M),  and  the  bottom  as  hard  (H).  The  cutoff  point  is 
shown  in  decreasing  difficulty  to  the  left  of  the  correct  target- 
number  values  and  in  ascending  difficulty  to  the  right  of  the 
values.  There  are  definite  differences  between  different  FOVs  but 
not  between  mono  and  stereo  vision.  In  fact,  there  was  almost  no 
difference  in  the  total  number  of  correctly  detected  targets — 
709  for  mono  vision  and  712  for  stereo  vision.  The  most  uniform 
distribution  from  easy  to  difficult  occurred  for  the  small  FOV  as 
opposed  to  the  poor  distribution  for  site  1  test,  where  90  percent  of 
the  observers  correctly  identified  80  percent  of  the  targets.  This 
was  because  of  the  good  target  contrast  and  color  discrimination 
during  clear  to  partly  cloudy  conditions  as  opposed  to  the  noisy, 
low-contrast  scenes  with  muted  color  when  it  was  overcast  with 
light rain at site 2.

5.2  Observer  Preference 

The observer mono versus stereo preference results were
collected before the observers knew how well they performed the
S and TA task. The point score for mono vision was 1.92 ± 0.84
and for stereo vision 3.08 ± 0.84. Stereo vision was preferred
by a margin greater than the individual populations' standard
deviation, but the one-standard-deviation ranges of the two
populations did overlap. For the female observer population, the results were
different.  The  mono  vision  was  preferred  with  a  point  score  of 
2.63  ±  0.92  over  the  stereo  vision  with  2.38  ±  0.92.  On  the  other 
hand,  the  male-observer  population  preferred  the  stereo  vision 
with  a  score  of  3.29  ±  0.71  to  the  mono  vision  with  1.71  ±  0.71.  In 
fact,  the  one  standard  deviation  populations  for  the  male  observer 
scores did not overlap. Why then didn't the analysis of variance
show any significant differences between the two techniques? As
previously  mentioned,  the  mono  vision  had  been  given  a  distinct 
advantage  with  the  best  LOS  imagery  and  no  dependence  on  the 
stereo  baseline  variation.  Also,  the  analysis  of  variance  is  not 
designed  to  address  the  issue  of  multiple-target  detection  tasks 
that  can  better  show  the  impact  of  false-target  clutter.  Essentially, 
the issue is how to handle a scene that has a valid target and a
very  good  false  target  present.  The  observer  who  is  forced  to  pick 
only  one  may  have  picked  both  as  valid  targets,  if  given  the 
option.  Therefore,  a  target  may  be  missed  in  this  case  due  to 
limited  target  option.  Also,  if  a  scene  has  no  targets  and  there  are 
two or more false targets that would have been chosen as valid
targets, the results only show one false alarm and not two or
more. It is necessary to view the observer results in terms of the
task score based on the criteria that were given to the observers.
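
The one-standard-deviation comparisons of the preference scores quoted in this section can be checked with a few lines of arithmetic. This is an illustrative sketch; the function name is hypothetical, and the only inputs are the means and standard deviations quoted in the text.

```python
def one_sigma_ranges_overlap(mean_a: float, sd_a: float,
                             mean_b: float, sd_b: float) -> bool:
    """Return True if [mean - sd, mean + sd] for A intersects the same range for B."""
    low_a, high_a = mean_a - sd_a, mean_a + sd_a
    low_b, high_b = mean_b - sd_b, mean_b + sd_b
    return low_a <= high_b and low_b <= high_a


# Preference scores (mono, stereo) as quoted in the text.
all_observers = one_sigma_ranges_overlap(1.92, 0.84, 3.08, 0.84)
female_observers = one_sigma_ranges_overlap(2.63, 0.92, 2.38, 0.92)
male_observers = one_sigma_ranges_overlap(1.71, 0.71, 3.29, 0.71)
print(all_observers, female_observers, male_observers)
```

For the male observers the mono range tops out at 2.42 while the stereo range starts at 2.58, so the ranges are disjoint, in agreement with the statement in the text.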


5.3  Observer  Score 

The  observer  score  was  derived  from  three  factors  as  defined  in 
the  observer's  general  task: 

1.  the  number  of  correctly  identified  targets, 

2.  the  number  of  missed  targets,  and 

3. the number of false target identifications.

Because  the  number  of  targets  detected  was  so  close  between  the 
mono  and  stereo  techniques,  only  the  number  of  correctly 
identified targets was used. The second factor of false-target
detection, or false alarms (FAs), will also be addressed. A clutter
rejection ratio (CR) is used that relates the number of FAs to the
number of correctly identified targets: the number of target IDs
is divided by the total number of detections, which includes the
target IDs plus the FAs. This was because the targets were not of
equal  detection  difficulty  as  shown  in  the  appendix  tables  A-4  and 
A-5.  Hence,  an  observer  that  has  a  given  number  of  false  alarms  is 
more  efficient  at  clutter  rejection  when  more  targets  have  been 
correctly  identified  compared  to  that  number  of  false-target 
detections.  The  results  of  the  overall  total  values  and  the  narrow 
baseline, small-FOV values are shown in the appendix table A-6.
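
The clutter rejection ratio described above is the fraction of an observer's detections that were real targets. A minimal sketch follows, assuming that definition; the function name and the example counts are hypothetical and are not taken from the appendix tables.

```python
def clutter_rejection_ratio(target_ids: int, false_alarms: int) -> float:
    """CR = correct target IDs / (correct target IDs + false alarms)."""
    total_detections = target_ids + false_alarms
    if total_detections == 0:
        raise ValueError("CR is undefined when nothing was detected")
    return target_ids / total_detections


# Same number of false alarms, more correct IDs: CR goes up.
print(clutter_rejection_ratio(target_ids=5, false_alarms=5))   # 0.5
print(clutter_rejection_ratio(target_ids=10, false_alarms=5))  # about 0.667
```

The two calls illustrate the point made in the text: for a fixed number of false alarms, the observer with more correct identifications has the higher CR.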

The  resulting  difference  between  the  testing  orders  was  that  the 
scores  of  the  group  who  were  shown  mono  vision  first  were  the 
worst.  The  low  CR  value  translates  into  more  FAs  for  this  group. 
Both  fewer  correctly  identified  targets  and  lower  CR  for  the 
narrow-baseline,  small-FOV  case  are  seen  in  the  mono  vision  case. 
The  results  for  all  of  the  FOVs  are  shown  in  the  appendix  table 
A-4.  What  is  of  interest  here  is  the  increase  in  the  CR  of  the  mono 
vision  between  medium  and  small  FOV,  indicating  that  global 
clutter  is  beginning  to  cause  a  problem  in  the  target-detection 
process. 

5.4 Effects of Range and FOV

The  effect  of  target  range  on  detection  is  a  variable  that  was  not 
considered  in  the  previous  analysis  of  variance.  The  difference  of 
correct  target  detections  using  stereo  versus  mono  vision  was 
investigated.  The  correct  target  detections  using  stereo  minus  the 
correct  target  detections  using  mono  vision  for  each  target 
location  are  shown  in  the  appendix  figures  A-17  through 
A-20.  Each  target  was  shown  to  18  observers.  To  begin  looking  at 
the  individual  targets  that  were  detected,  the  target  designation 
scheme needs to be given. Six target sequences were performed
with  four  personnel  in  each.  Therefore,  the  target  positions  were 
labeled  as  1  through  6  and  A  through  D.  Looking  at  the  appendix 
figures  A-17  through  A-20,  there  are  some  distinct  outliers  in  the 
distributions  at  medium  to  long  range  that  need  to  be 
investigated.  These  are  the  low  points  (more  mono  detections 
than  stereo):  target  6B  at  850  m  with  a  difference  of  -7  in  appendix 
figure  A-17;  target  3B  and  3A  both  at  670  m  with  a  difference  of 
-10  and  -6,  respectively,  in  appendix  figure  A-18;  target  3B  at 
670  m  with  a  difference  of  -9  in  appendix  figure  A-19;  and  targets 
4C,  5D,  and  6B  at  895  m,  850  m,  and  850  m,  respectively,  with 
differences of -10, -6, and respectively, in appendix
figure  A-20. 

•  Large  FOV  target  6B— Appendix  figure  A-21  shows  the  left 
and  center  LOS  views  for  target  6B  in  the  large-FOV 
display.  This  standing  target  has  significantly  more 
contrast  in  the  left  view  than  the  center  view  on  the  right 
of  appendix  figure  A-21.  The  top  of  this  scene  that  has 
significant  parallax  is  difficult  to  stereo  fuse  without 
several  sessions  of  stereo  fusion  training.  It  appears  that 
none  of  the  observers  successfully  fused  this  target  because 
there  were  no  correct  target  detections;  whereas,  there 
were  seven  observers  that  keyed  on  the  high  contrast 
outline  present  in  the  mono  (left)  view  of  the  target.  The 
high  contrast  is  not  present  in  the  center  view  (right)  that 
was  part  of  the  stereo  display  pair.  Without  this  outlier, 
the  large  FOV  detections  would  have  had  only  4  targets 
with  better  mono  detection  than  stereo  and  12  targets  with 
better  stereo  detection  than  mono  and  14  more  stereo 
detections  than  mono.  This  would  have  represented  16 
percent  better  target  detection  with  stereo. 

•  Medium  FOV  targets  3B  and  3A— Appendix  figure 
A-22  shows  the  left  and  center  LOS  views  for  these  targets 
in  the  medium  FOV  display.  In  this  case,  the  left  view  is 
much  clearer  than  the  center  shown  on  the  right  of 
appendix figure A-22. Also, the left view that was used as
the  mono  display  had  a  head  feature  associated  with  the 
dark  blob  in  the  line  of  bushes  at  the  bottom  right  where 
target  3B  is  located.  There  is  another  similar  target 
3A  with  a  -6  detection  difference  in  the  bottom  left  that 
was  also  easier  to  pick  out  in  the  left  image  than  the  center. 
Without  these  outliers,  the  medium  FOV  detections  would 
have  had  8+  and  8-  value  for  the  detection  differences 
between  mono  and  stereo  vision.  The  stereo  would  have 
had  10  more  detections,  or  6  percent  better  target 
detection. 

• Small FOV, narrow baseline target 3B — Appendix figure
A-23  shows  the  left  and  center  LOS  views  for  this  target  in 
the  small,  narrow  baseline  FOV  display.  As  before,  target 
3B  has  no  head  feature  and  almost  no  contrast  in  the  center 
(on  right)  view  of  the  target  shown  in  appendix  figure 
A-23.  This  target  was  of  moderate  difficulty  with  mono 
vision.  Without  this  outlier  the  small,  narrow  baseline 
FOV  detections  would  have  had  only  five  targets  with 
better  mono  detection  than  stereo  and  13  with  better  stereo 
detection  than  mono  and  26  more  stereo  detections  than 
mono.  This  would  have  represented  12  percent  better 
target  detection  with  stereo. 

•  Small  FOV  with  wide  baseline  targets  4C,  6B,  and  5D — 
Appendix figure A-24 shows the left and right views of
target 4C in the small, wide baseline FOV display. This
was of moderate difficulty to pick out with mono vision
from the left view where the standing target silhouette can
be  seen  in  the  center  to  top  center  in  front  of  a  pine  tree. 
The  right  view  shows  very  little  if  any  silhouette  for  this 
target.  Appendix  figure  A-25  shows  the  left  and  right 
views of target 6B in the small, wide baseline FOV display.
The  target  is  located  in  the  top  right  of  the  two  views  but 
has  better  contrast  in  the  left  view  used  for  the  mono 
vision.  The  target  was  easy  to  detect  using  mono  vision  as 
90  percent  of  the  observers  correctly  identified  it;  whereas, 
only about one-half of the observers found it using stereo
vision. 

Two different factors might explain the difference in detection
difficulty:

1.  Of  the  eight  observers  who  missed  the  target,  one- 
half  identified  the  large  white  rock  in  the  bottom 
right  as  a  target. 

2.  Six  of  the  eight  were  right-eye  dominant. 

With right-eye dominance the observers get more cues from the
right stereo image that had less contrast for the real target 6B and
a  high-contrast,  false-target  object.  Next,  appendix  figure 
A-26  shows  the  left  and  right  views  of  the  target  5D  in  the  small, 
wide  baseline  FOV  display.  The  faint  target  is  located  in  the 
center  left  to  top  left  of  the  left  view  used  for  the  mono  vision. 
The  right  view  used  in  the  stereo  vision  pair  did  not  have  a 
distinct  view  of  the  target  at  all  because  it  merged  with  a  tree 
feature located at the center and top left. There were no stereo target
detections for this target. Without these outliers the small, wide
baseline FOV detections would have had six targets with better
mono  detection  than  stereo,  seven  targets  with  better  stereo 
detection  than  mono,  and  11  more  stereo  detections  than  mono. 
This  would  have  represented  6  percent  better  target  detection 
with  stereo. 

As  a  final  note,  the  small,  wide  baseline  FOV  target  6D  outlier  at 
840  m  with  a  value  of  +8  will  be  addressed.  It  is  shown  in  the  top 
right  of  both  views  in  appendix  figure  A-27.  This  bush-like  object 
was  hard  to  detect  with  mono  vision  using  the  left  view  of 
appendix  figure  A-27.  When  the  right  and  left  views  were  used  as 
a stereo pair display the bush-like feature stood out better and had
a  faint  head  associated  with  it.  Thus,  the  volume  cue  stereo 
detection  for  this  target  was  only  of  medium  difficulty  instead  of 
hard. 

The  net  result  of  this  analysis  is  that  with  only  minor  changes  to 
improve  the  display  of  the  stereo  images  presented  there  could 
easily  have  been  10  percent  more  detections  using  stereo  vision 
than  mono  vision.  With  an  optimized  display,  the  difference 
would  likely  be  at  least  20  percent  more  detections  using  stereo 
vision  than  mono  vision.  As  illustrated  by  appendix  figure 
A-19,  these  increases  in  detection  will  occur  at  longer  ranges. 

Another  issue  is  the  question  of  whether  there  was  a  difference 
between  the  ordering  of  the  test  presentation  (mono  vision  first  vs. 
stereo  vision  first  and  mono  vision  second  vs.  stereo  vision 
second).  Plots  of  the  number  of  targets  detected  versus  observer 
score  are  useful  in  seeing  the  difference.  Appendix  figures 
A-28  through  A-31  show  these  plots  for  the  four  cases. 
Immediately  apparent  is  that  the  mono  vision  first  results  have  a 
clustering  of  observer  results  that  have  observers'  scores  between 
-80  and  -100  instead  of  the  other  test  orderings  whose  scores 
cluster in the -60 to -80 ranges. With comparable target-detection
numbers for all four test orderings, there had to
be more false target detections using mono first than with any of
the others. The CR numbers shown in appendix tables A-6 and
A-7 reflect this. For the mono first results, there were twice as
many false target identifications as there were correct target
identifications.  The  number  of  false  target  identifications  is 
reduced  for  the  mono  vision  when  the  stereo  vision  is  seen  first, 
especially  for  the  large  and  medium  FOVs  as  seen  in  appendix 
table  A-7.  The  clutter  rejection  is  not  as  good  for  the  small  FOVs, 
but  more  targets  are  detected. 

The  effects  of  target  range  can  also  be  expressed  in  terms  of  the 
correct  target-ID  rate  (i.e.,  the  number  of  correct  target  IDs 
divided  by  the  total  number  of  targets),  which  are  shown  for  the 
different  FOVs  in  appendix  table  A-2.  The  difference  of  ID  rate 
using  stereo  versus  mono  vision  will  be  investigated.  The  mono 
vision  and  stereo  vision  correct  target-ID  rates  for  large,  medium, 
small  with  narrow  baseline,  and  small  with  wide  baseline  FOV 
images  are  shown  in  appendix  figures  A-32  through  A-35.  By 
looking  at  the  individual  target  differences  between  the  mono  and 
stereo  vision  correct  target-ID  rates,  it  is  possible  to  isolate  several 
distinct  outliers  where  the  mono  vision  did  significantly  better  as 
previously  discussed.  But  in  each  case,  there  was  a  distinct 
difference  in  quality  between  the  left  image  used  for  the  mono 
vision  test  and  the  center  or  right  image  used  as  the  right-eye 
input  for  the  stereo  image  test.  These  outliers  are 

•  850-m  range  target  in  the  large  FOV  (shown  in  appendix 
figure  A-32); 

•  two  670-m  range  targets  in  the  medium  FOV  (shown  in 
appendix  figure  A-33); 

•  a  670-m  range  target  in  the  small  with  narrow  baseline 
FOV  (shown  in  appendix  figure  A-34);  and 

•  a  236-m,  two  850-m,  and  an  895-m  range  targets  in  the 
small  with  wide  baseline  FOV  (shown  in  appendix  figure 
A-35). 

The  magnitude  of  the  outliers  can  be  seen  better  by  displaying  the 
difference  (mono  vision  minus  stereo  vision)  in  the  correct  target- 
ID  rates  for  both  the  small  FOV  cases  (shown  in  appendix  figure 
A-36).  The  outliers  are  the  five  values  that  are  below  -0.2  and  can 
each  be  related  to  a  display  problem  that  gave  the  mono  vision 
test  a  distinct  advantage.  Without  these  outliers,  the  stereo  vision 
can  be  seen  to  provide  several  examples  of  improved  target 
detection  especially  at  the  longer  ranges.  The  net  result  of  this 
analysis is that with only minor changes to improve the display of
the stereo images presented, there could easily have been 10
percent  more  correct  target  IDs  using  stereo  vision  than  mono 
vision.  With  an  optimized  display,  the  difference  would  likely  be 
at  least  20  percent  more  correct  target  IDs  using  stereo  vision  than 
mono  vision.  As  illustrated  by  appendix  figure  A-36,  these 
increases in detection will occur at longer ranges.

5.5 Clutter Rejection

The  next  logical  step  was  an  investigation  into  the  characteristics 
of  the  clutter-rejection  values  versus  the  scores.  Plots  of  observer 
score  versus  the  clutter-rejection  efficiency  for  the  four 
observation  orderings  are  given  in  appendix  figures  A-37  through 
A-40.  In  each  case,  there  is  a  lower-limit  line  with  positive  slope 
that  reaches  the  one-third  value  for  clutter-rejection  efficiency  at  a 
score  of  about  -70.  What  this  means  is  that  at  -70  with  a  one-third 
CR  value  the  observer  would  obtain  no  points  for  target 
detections,  because  the  correct  target  detection  score  (+2  for  every 
correct  identification)  would  be  cancelled  by  the  false  target 
detection  score  (-1  for  every  incorrect  identification).  Therefore, 
the  score  is  based  on  the  missed  target  score  (-3  for  every  missed 
target)  or  about  23  missed  targets. 
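
The arithmetic behind this lower-limit point can be made explicit. A CR of one-third means two false alarms for every correct identification, so the +2 and -1 terms cancel and only the missed-target penalty remains. The sketch below is illustrative; the function name is hypothetical, and the 46-target total is the upper of the two test sizes (45 or 46) reported in this section.

```python
def score_at_one_third_cr(total_targets: int, correct_ids: int) -> int:
    """Observer score when CR is exactly 1/3, i.e. false alarms = 2 * correct IDs."""
    false_alarms = 2 * correct_ids            # CR = ids / (ids + 2*ids) = 1/3
    missed = total_targets - correct_ids
    detection_points = 2 * correct_ids - 1 * false_alarms  # always cancels to 0
    return detection_points - 3 * missed


# Detecting half of 46 targets leaves 23 misses: 0 - 3*23 = -69, i.e. about -70.
print(score_at_one_third_cr(total_targets=46, correct_ids=23))
```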

There  are  45  or  46  targets  shown  in  the  different  tests 
administered  to  the  four  orderings  of  the  observer  tests;  thus,  the 
-70  score  with  one-third  efficiency  represents  about  the  50  percent 
target  detection  position  for  the  overall  test.  There  is  some 
upward  migration  of  the  observer-score  distribution  along  the 
lower-bound  line  for  mono  vision  first  near  the  middle.  The 
upward migration of the score distribution away from the lower-
bound line for the mono second is definitely concentrated in the
better score region to the right of the 50 percent detection position
with the overall distribution moving to the right. The upward
migration of the score distribution away from the lower-bound
line for the stereo first is also concentrated in the better score
region  to  the  right  of  the  50  percent  detection  position  with  the 
overall  distribution  showing  marked  movement  to  the  right. 

Finally,  the  upward  migration  of  the  score  distribution  away  from 
the  lower-bound  line  for  the  stereo  second  occurs  everywhere 
except  in  the  lower-score  region  with  a  thoroughly  marked 
movement  of  the  distribution  to  the  right.  The  comparison  of 
these  four  plots  shows  that  there  is  definitely  an  overall 
improvement  shown  in  target-clutter  rejection  efficiency  in  the 
better  score  region  by  every  test  ordering  except  mono  first. 

The  next  step  in  the  analysis  of  the  clutter  rejection  was  to  define  a 
function  that  has  better  characteristics  over  the  observer  score 
distribution  than  just  the  ratio  of  correct-target  detections  divided 
by  the  total  number  of  target  detections.  The  problem  is  that  the 
targets  do  not  all  have  the  same  difficulty  in  detection  associated 
with  them  as  seen  in  appendix  tables  A-4  and  A-5.  If  the  average 
number  of  target  detections  is  5  with  a  CR  of  one-third,  an 
observer  that  detects  8  targets  with  16  false  alarms  has  done  a 
much  better  job  of  clutter  rejection  than  an  observer  who  has  only 
detected 2 targets with 4 false targets. However, both of their CR
values  would  have  been  one-third.  An  initial  detection  and 
clutter-rejection  measure  (D/CRM)  was  defined  that  incorporates 
all  of  the  above  issues.  The  basic  idea  is  to  perform  a  vector 
addition  of  the  score,  which  is  highly  correlated  to  the  correct 
target  detections,  and  clutter  rejection  efficiency  values 
normalized about the mean and standard deviation of the mono-
first distribution. Hence, if the score (-63) were higher (better)
than the mono-first mean by one mono-first standard deviation
(the mono-first score distribution being -82 ± 19), then
the score portion of the D/CRM would be 1. If the clutter-
rejection efficiency (0.41) were higher (better) than the mono-first
mean by one mono-first standard deviation (the mono-first
clutter-rejection efficiency distribution being 0.33 ± 0.08),
then the clutter-rejection efficiency portion of the
D/CRM would also be 1. These vector values would combine to
give an overall detection and clutter-rejection measure value of
1, which is normalized by √2. If the score were -101 (with the
clutter-rejection portion still 1), then the score portion of the
D/CRM would be -1, and the D/CRM would
be 0. For the case where the score and clutter-rejection portions
have opposite signs, the square root of the magnitude of the
difference of the two vector components is taken and the result
divided by √2. The D/CRM thus defined had a problem. The
value of the clutter-rejection efficiency was not properly bounded.
With possible values ranging from 0 to 1, the clutter-rejection
portion of the D/CRM could take on values of -4 to +8.
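
The normalization step of this initial D/CRM can be sketched numerically. The code below covers only what the text states explicitly: the two normalized components, and their combination when both are positive. The function names are hypothetical, and the handling of opposite-sign or both-negative components is deliberately left out because the text does not define it precisely enough to code.

```python
import math

# Mono-first distribution parameters quoted in the text.
MONO_FIRST_SCORE_MEAN, MONO_FIRST_SCORE_SD = -82.0, 19.0
MONO_FIRST_CR_MEAN, MONO_FIRST_CR_SD = 0.33, 0.08


def score_component(score: float) -> float:
    """Score expressed in mono-first standard deviations from the mono-first mean."""
    return (score - MONO_FIRST_SCORE_MEAN) / MONO_FIRST_SCORE_SD


def cr_component(cr: float) -> float:
    """Clutter-rejection efficiency in mono-first standard deviations from the mean."""
    return (cr - MONO_FIRST_CR_MEAN) / MONO_FIRST_CR_SD


def combine_positive_components(z_score: float, z_cr: float) -> float:
    """Vector sum of two positive components, normalized by sqrt(2)."""
    if z_score < 0 or z_cr < 0:
        raise ValueError("only the both-positive case is sketched here")
    return math.hypot(z_score, z_cr) / math.sqrt(2)


# Worked example from the text: score -63 and efficiency 0.41 each give 1.
print(combine_positive_components(score_component(-63), cr_component(0.41)))
# Unbalanced range of the CR component for efficiencies of 0 and 1.
print(cr_component(0.0), cr_component(1.0))  # about -4.1 and +8.4
```

The last line reproduces the lopsided bounds of roughly -4 to +8 that motivated the weighted revision of the measure.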

To correct this lopsided bounding and take into consideration the
issue  of  how  many  targets  were  detected  when  obtaining  the 
clutter-rejection  efficiency  value,  the  average  number  of  detected 
targets  for  the  mono  first  tests  was  used.  The  clutter-rejection 
value  portion  of  the  new  D/CRM  was  weighted  by  the  square  of 
the  ratio  of  the  number  of  correct  targets  identified,  divided  by  the 
average  number  of  targets  detected  in  the  mono  first  test.  There  is 
a  proviso  that  if  the  number  were  greater  than  one  the  weighting 
enhancement  would  not  be  applied  when  the  clutter  rejection 
efficiency  was  already  larger  than  four  standard  deviations  above 
the  mean  clutter-rejection  efficiency  (i.e.,  0.67).  When  this  was 
done,  a  well-behaved  function  was  obtained.  The  overall  mono 
first  distribution  of  D/CRM  values  had  two-thirds  within  one 
standard  deviation  of  the  average  value,  and  one-sixth  both 
one-to-two  standard  deviations  above  and  below  the  average 
value.  The  D/CRM  was  then  plotted  against  the  observer  score  to 
obtain  a  straight-line  plot  for  the  mono  first  case  and  straight  lines 
with  minor  variations  for  the  other  presentation  orderings  as 
shown  in  appendix  figures  A-41  through  A-44. 
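
The weighting rule described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the report's implementation: the function name is hypothetical, the mono-first average of 19.2 detected targets is taken from the T value for Mono (f) in appendix table A-6, and the proviso is read as "leave the weight at 1 when the ratio exceeds one and the efficiency is already above 0.67".

```python
# Assumed mono-first average number of detected targets (T, Mono (f), table A-6).
MONO_FIRST_AVG_TARGETS = 19.2

# Efficiency threshold quoted in the text for the proviso.
CR_CAP = 0.67


def cr_weight(n_correct, cr_efficiency,
              mono_first_avg=MONO_FIRST_AVG_TARGETS, cr_cap=CR_CAP):
    """Weight applied to the clutter-rejection portion of the D/CRM.

    The weight is the square of the ratio of correct detections to the
    mono-first average. Assumed reading of the proviso: an enhancement
    (ratio > 1) is skipped when the efficiency already exceeds cr_cap.
    """
    ratio = n_correct / mono_first_avg
    if ratio > 1.0 and cr_efficiency > cr_cap:
        return 1.0
    return ratio ** 2


print(cr_weight(9.6, 0.33))    # half the average number of detections
print(cr_weight(24.0, 0.40))   # above average, efficiency below the cap
print(cr_weight(24.0, 0.70))   # above average, efficiency above the cap
```

Under this reading, an observer who rejects clutter well only by reporting very few targets has the clutter-rejection portion scaled down, which is what keeps the measure from rewarding under-reporting.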
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Now,  the  deviations  as  a  function  of  score  can  be  better  seen 
between  the  mono  first  and  the  other  presentation  orderings.  The 
portion  of  the  mono  first  observer  distribution  that  received  better 
scores  did  worse  in  rejecting  false  targets  than  any  of  the  other 
presentation  orderings  for  all  of  the  FOVs  combined.  The 
question  becomes,  then,  how  did  stereo  vision  training  impact 
stereo  vision  performance?  From  the  comments  made  by  the 
observers  and  noting  the  reactions  to  rating  the  ease  of  fusing  the 
scenes,  the  observers'  ability  to  use  stereo  vision  improved 
significantly  during  the  course  of  the  test  that  lasted  a  little  over 
an  hour.  The  stereo  training  portion  given  prior  to  the  test  lasted 
only  about  15  min.  The  large/medium  FOV  portion  of  the  test 
was  given  first  and  then  the  small  FOV  portion. 

If the observers were improving their use of stereo vision during
the testing, there may be improvement in the use of stereo vision
that shows up between the large and medium FOV portion of the
test and the small FOV portion that is not related to the FOV size
difference. Alternatively, the observers might have performed better
on the large and medium FOV portion of the test had they had
more stereo vision training. To see these effects, plots of the
observer score versus the large FOV D/CRM are shown in appendix
figures A-45 through A-48. Here, there is more scatter because of
the smaller sample size, but the plots do not show the marked
upward migration of the distribution seen in the overall FOV test
results for the mono second, stereo first, and stereo second
orderings in appendix figures A-42 through A-44. Rather than show
all of the FOV plots separately, the ratios of the D/CRM for the
other presentation orderings to that of the mono first are shown
for the four different FOVs in appendix figures A-49 through
A-52. Here, for the large FOV (shown in appendix figure
A-49), some detection improvement is evident since the data form
a line whose slope is larger than 1 with a negative x-axis intercept.
Also, only the observers with the poorest clutter rejection (the left
end of the distribution) had values that fell below the line of slope
1.0 passing through the origin, which would represent no difference
in the large FOV results as a function of presentation order. For
the medium FOV (shown in appendix figure A-50), there is a more
pronounced improvement, especially for the right side of the
distribution, and again, only the left end of the distribution
showed poorer results. The small, narrow baseline FOV (shown
in appendix figure A-51) gave the best results; even the left end of
the distribution, where the clutter rejection results were the worst,
did better. For the small, wide baseline FOV (shown in appendix
figure A-52), the observers did not perform better using stereo on
the left end, where the scores were poorest. However, for those
observers who could use stereo, the results were quite good, as
seen on the right end of the distribution.


Finally, because the small FOV narrow and wide baseline results were
obtained simultaneously in the second portion of the stereo test,
the combined results are shown in appendix figure A-53. Here
there is much less variation in the results because of the larger
sample size. The transition from the poorer results for the left end
of the distribution to better results on the right end of the
distribution can be clearly seen for the two stereo cases, which track
each other very well. The mono second results can be seen to be
better for the right end of the distribution but not as good as the
stereo vision results. These results indicate that about a factor of
2 reduction in the false alarms may be achievable if sufficient
training in the use of stereo vision is given and the display of the
scenes is optimized.

The last issue to be addressed is the false target detections, or false
alarms (FAs). As can be seen from the CR values in appendix table
A-7, the ratio of false alarms to correct target detections
decreased from about 3 to 1 with the large FOV for the mono
vision first case to about 1.5 to 1 for the medium FOV. As
mentioned before, it appears that for the narrow FOV global
clutter may have become a problem for the mono vision first case,
since the ratio of false alarms to correct detections rose to a
little over 2 to 1. Therefore, the narrow FOV cases will be
considered. As mentioned above, the average numbers of targets
detected were fairly close. Using mono vision, there were 12.08 ±
2.23 correct target detections out of 23 possible targets; using
stereo vision there were 12.36 ± 2.05. The average numbers of
false alarms were considerably different between mono and stereo
vision. Using mono vision, there were 24.64 ± 11.44 false alarms
for the small FOV cases, whereas for stereo vision there were only
18.72 ± 11.06. This represents an average decrease in false alarms
of only roughly 25 percent, but the distribution is quite different for
those observers in the upper half of the distribution compared to
the lower half. Appendix figures A-54 and A-55 compare the
numbers of false alarms for all 36 observers between mono
and stereo vision for the combined small FOV cases. In appendix
figure A-54, the half of the distribution that performed better
(fewer false alarms) represented mono vision scores of 25 or less.
In this portion of the curve, the slope ranges from about one-half
to two-thirds, corresponding to 33- to 50-percent decreases in false
alarms. In the other half of the distribution, the slope becomes > 1
and shows far less difference in false alarms (for those observers
who were not able to use stereo vision effectively). For stereo
vision compared to mono vision, there is a significant overall drop
in the number of false target detections, or FAs, for the small
FOV scenes shown in appendix figure A-55. These results
indicate that the number of FAs may be decreased by a factor of 2
if sufficient training using stereo vision is given to the observers
prior to testing.
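
The small FOV averages quoted above can be checked with a few lines of Python. This is only an arithmetic sketch using the mean values from the text; the function names are hypothetical, and the standard deviations are ignored.

```python
# Mean counts for the combined small FOV cases, as quoted in the text.
MONO_CORRECT, STEREO_CORRECT = 12.08, 12.36
MONO_FA, STEREO_FA = 24.64, 18.72


def fa_per_detection(false_alarms, correct):
    """False alarms per correct target detection."""
    return false_alarms / correct


def fractional_decrease(before, after):
    """Fractional drop from 'before' to 'after'."""
    return (before - after) / before


mono_ratio = fa_per_detection(MONO_FA, MONO_CORRECT)
stereo_ratio = fa_per_detection(STEREO_FA, STEREO_CORRECT)
fa_drop = fractional_decrease(MONO_FA, STEREO_FA)

print(f"mono FA per detection:   {mono_ratio:.2f}")
print(f"stereo FA per detection: {stereo_ratio:.2f}")
print(f"mean FA decrease:        {fa_drop:.1%}")
```

The mono ratio comes out a little over 2 to 1, the stereo ratio about 1.5 to 1, and the mean decrease in false alarms at about 24 percent, in line with the roughly 25 percent quoted in the text.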


Conclusions 


An S and TA test was conducted that provided single line of sight
(mono) and wide baseline stereo imagery for observer testing. The
database contains the same scene with and without camouflaged human
targets present. The analysis of imagery from the second of two
sites has resulted in several interesting findings. Analysis of
variance did not show significant differences between mono
vision and stereo vision in general; however, there were
differences in the observer responses. First, there was a significant
difference in the false-target detections between mono and stereo
vision for the narrow FOV cases. Second, analysis of the effect of
target range showed that for the small, narrow-baseline FOV case
there was better performance at longer ranges. Also, several
targets could be identified as distribution outliers, and the
rationale for the poorer performance of stereo vision on them
related to biased displays that favored mono vision.
There was only a little difference in the total number of correctly
detected targets. Nonetheless, there is reason to believe that an
approximately 20-percent increase in correctly identified targets
using stereo vision may be obtainable if proper training in
the use of stereo vision is given prior to testing and optimized
displays are used. Third, it appears that as the FOV was
decreased from medium to small, mono vision
experienced an increase in false alarms, possibly from the effects
of global clutter. About one-half of the observers showed a
substantial decrease in false alarms, indicating again that, with
proper training in the use of stereo vision and optimized displays,
the number of false alarms may be decreased by a factor of 2 using
stereo vision. Finally, data were obtained that can be used to
optimize the display for observer performance using stereo vision.

Further testing is required to obtain results that show analysis of
variance significance for stereo vision over mono vision. There
were too many nuisance factors in the present observer
experiment; these could now be eliminated or greatly reduced by
performing further observer tests based on the results presented
here. The testing should concentrate on the small, narrow
baseline FOVs with targets at ranges of 500 m or more and with
either one or no targets present. The displays from the two LOS should
be shown such that any differences in mono vision performance
can be identified and compared to stereo displays that normalize
out the effects of observer eye dominance. The observers should
be divided into two groups, each shown only the mono or the
stereo test. Finally, the observers must be adequately trained
using stereo vision before the S and TA test is performed. These
tests could be performed at the U.S. Military Academy, where
there is faculty interest in participating in the experiments and a
reasonable observer base that can be utilized.

Appendix. Search and Target Acquisition: Single Line of Sight
Versus Wide Baseline Stereo Figures and Tables


Figure A-1. The four TNO personnel with forest camouflage used as targets.

Figure A-2. The three stereo camera setup with 10-m baseline separation
used at

Figure A-3. Whole scene from site 1 with no targets.

Figure A-4. Site 1 with target positions designated with large white cards.

Figure A-5. Whole scene from site 2 with no targets.

Figure A-6. Site 2 with target positions designated with large white cards.

Table A-1. The 7x4 array of target sectors

Sectors

A1   A2   A3   A4   A5   A6   A7
B1   B2   B3   B4   B5   B6   B7
C1   C2   C3   C4   C5   C6   C7
D1   D2   D3   D4   D5   D6   D7

NOTE: Used for site 1 with nominal ranges to the ground level within the
sectors of 130 m to 180 m in row D, 180 m to 340 m in row C, 340 m to
520 m in row B, and 520 m to 675 m in row A.


Table A-2. FOVs used for the site 2 test

FOV size                  Number of grid blocks   Sectors used for targets
------------------------  ----------------------  ---------------------------
Small (50% smaller),      8 wide by 5 high        20 with 23 targets
192 x 120 pixels                                  Group A: 10 with 12 targets
                                                  Group B: 10 with 11 targets

Medium (standard size),   16 wide by 10 high      12 with 23 targets
384 x 240 pixels                                  Group 1: 6 with 12 targets
                                                  Group 2: 6 with 11 targets

Large (50% larger),       24 wide by 15 high      10 with 22 targets
576 x 360 pixels                                  Group 1: 5 with 10 targets
                                                  Group 2: 5 with 12 targets

NOTE: The small FOV had Groups A or B displayed with the 10-m baseline and
the other group with the 20-m baseline. The medium and large FOVs
were combined for one portion of the test with either Groups No. 1 or 2
that contained either 22 or 23 different targets.


Figure A-7. Large FOV scene with three targets present.

NOTE: The most conspicuous target is crouching just to the left of the
concrete bunker in the bottom of the scene. The other two targets are in
the center and upper right.

Figure A-8. Medium FOV scene with five targets present.

NOTE: There is only one obvious standing target in the center right.

Figure A-9. Medium FOV scene with two squatting targets present.

Figure A-10. Small FOV scene with one large squatting target present.

NOTE: There is one easy target in the lower left portion of the scene and a
very difficult target just on the far side of the road between top left and
top center.

NOTE: Small FOV scene with a large target squatting in the center right and
a very similar false target in the lower left.

Table A-3. Distribution of correct target detections by the observers for
a particular target for site 1

Problems with first site

Number of targets   Number of correct detections out of 30
-----------------   --------------------------------------
19                  27-30
1                   16-26
3                   12-15
1                   0-3

NOTE: The targets were too easy to detect and the stereo display did not
work at ranges of less than 300 m.


Figure A-11. One of the moderately difficult targets in the squatting
position at 550 m in the lower center of the scene under clear sky
conditions from site 1.

Figure A-12. Another moderately difficult target from site 1 is shown
squatting in the lower center of the scene at 375 m.

Figure A-13. An easy standing target is shown in the center left at 450 m
and a second moderately difficult target to the left of the big bush in the
top left at 475 m.

Figure A-14. An easy standing target is shown in the center left at 450 m
and a second moderately difficult target to the left of the big bush in the
top left at 475 m.

Figure A-15. This is a panoramic view of the entire first site with no
targets present with the near road at 100 m and the far road at 650 m.

Figure A-16. Example of how scenes must be merged differently top to bottom
when observer views the objects A, B, and C in stereo.

NOTE: The observers have different parallax between the left and right
images.

Table A-4. Difficulty in correctly detecting targets in site 2 scenes using
mono vision

Detection difficulty mono LOS

Efficiency   LM     MM     SNM    SWM
----------   ----   ----   ----   ----
0.86-1.00    1      3      5      6
0.70-0.85    2      4      E4M    E5M
0.53-0.69    0      E2M    3      1
0.36-0.52    3      2      3      1
0.20-0.35    E3M    2      M2H    M3
0.00-0.19    M13H   M10H   6      7H

NOTES:

1. Abbreviations:
E - Easy
M - Medium
H - Hard

2. Acronyms:
LM - Large Mono
MM - Medium Mono
SNM - Small Narrow Baseline Mono
SWM - Small Wide Baseline Mono

3. The results are as a function of FOV and correct target ID efficiency
(fraction of the observers making correct target ID). The entries are the
correct target IDs. Here there is no difference between the small, narrow
mono SNM and small, wide mono SWM scenes displayed.

Table A-5. Difficulty in correctly detecting the targets in site 2 scenes
using stereo vision

Detection difficulty stereo LOS

Efficiency   LS     MS     SNS    SWS
----------   ----   ----   ----   ----
0.86-1.00    1      3      E8     6
0.70-0.85    2      E5     3M     E5M
0.53-0.69    2      0      1      3
0.36-0.52    E2     5M     2      1
0.20-0.35    2M     2      M4H    M4H
0.00-0.19    M13H   M8H    5      6

NOTES:

1. Abbreviations:
E - Easy
M - Medium
H - Hard

2. Acronyms:
LS - Large Stereo
MS - Medium Stereo
SNS - Small Narrow Baseline Stereo
SWS - Small Wide Baseline Stereo

3. The results are as a function of FOV and correct target-ID efficiency.

Table A-6. Observer task results showing the scoring results for the
different testing order, first (f) and second (s), for the combined FOV and
narrow baseline, small FOV

S and TA results

                      Mono (f)   Stereo (f)   Mono (s)   Stereo (s)
All FOVs              T = 19.2   T = 20.8     T = 20.2   T = 19.2
(45 or 46 targets)    CR = .33   CR = .41     CR = .39   CR = .45
                      S = -82    S = -68      S = -73    S = -68

Small/narrow FOV      T = 5.6    T = 6.6      T = 6.5    T = 6.6
(11.5 targets)        CR = .34   CR = .44     CR = .37   CR = .48
                      S = -20    S = -12      S = -14    S = -10

NOTE:
T - Targets
CR - Clutter Rejection
S - Score

Table A-7. Observer task results show scoring results for the different
testing order, first (f) and second (s), for the combined FOV and narrow
baseline, small FOV

S and TA results

                    Mono (f)          Stereo (f)        Mono (s)   Stereo (s)
Large FOV           T = 2.9           T = 3.2           T = 2.4    T = 2.5
                    CR = .28          CR = [illegible]  CR = .34   CR = [illegible]
                    S = -27           S = -25           S = -27    S = -26

Medium FOV          T = [illegible]   T = 5.0           T = 5.0    T = 4.6
                    CR = .38          CR = .44          CR = .47   CR = .49
                    S = -19           S = -16           S = -16    S = -17

Small/narrow FOV    T = 5.6           T = 6.6           T = 6.5    T = 6.6
                    CR = .34          CR = .44          CR = .37   CR = .48
                    S = -20           S = -12           S = -14    S = -10

Small/wide FOV      T = 5.8           T = 6.0           T = 6.3    T = 5.5
                    CR = .36          CR = .42          CR = .37   CR = [illegible]
                    S = -17           S = -15           S = -16    S = -15

NOTE:
T - Targets
CR - Clutter Rejection
S - Score


Figure A-17. Plot of correct target detections (stereo/mono) as a function
of target range for the large FOV.

Figure A-18. Plot of correct target detections (stereo/mono) as a function
of target range for medium FOV.

Figure A-19. Plot of correct target detections (stereo/mono) as a function
of target range for the narrow baseline, small

Figure A-20. Plot of correct target detections (stereo/mono) as a function
of target range for the wide baseline, small FOV.

Figure A-21. Left (on left) and center (on right) large FOV of the scene
containing target 6B (top right to top center of left view).

Figure A-24. Left (on left) and right (on right) small, wide baseline FOV
of the scene containing target 4C (center to top center, standing in front
of a bush).

Figure A-25. Left (on left) and right (on right) small, wide baseline FOV
of the scene containing target 6B (top left in both).

Figure A-26. Left (on left) and right (on right) small, wide baseline FOV
of the scene containing target 5D (top left of the left view).

Figure A-27. Left (on left) and right (on right) small, wide baseline FOV
of the scene containing target 6D (top right of both views).

Figure A-32. Comparison of mono versus stereo vision for correct target ID
rate for the large FOV scenes.

Figure A-33. Comparison of mono versus stereo vision for correct target ID
rate for the medium FOV scenes.

Figure A-34. Comparison of mono versus stereo vision for correct target ID
rate for the small FOV scenes with narrow stereo baseline.

Figure A-35. Comparison of mono versus stereo vision for correct target ID
rate for the small FOV scenes with wide stereo baseline.

Figure A-42. Plot of observer score versus detection/clutter rejection
measure for the mono vision first test.


[Plot: Score vs Detection/Clutter Rejection Measure (Mono Second)]


Figure A-43. Plot of observer score versus detection/clutter rejection
measure for the mono vision first test.

Figure A-44. Plot of observer score versus detection/clutter rejection
measure for the mono vision first test.

Figure A-45. Plot of observer score versus detection/clutter rejection
measure for large FOV mono first test.

Figure A-46. Plot of observer score versus detection/clutter rejection
measure for large FOV mono second test.

Figure A-47. Plot of observer score versus detection/clutter rejection
measure for large FOV stereo first test.

Figure A-48. Plot of observer score versus detection/clutter rejection
measure for large FOV stereo second test.

Figure A-49. Plot of ratio detection/clutter rejection measure for other
orderings to that of mono first test for large FOV.

Figure A-50. Plot of ratio detection/clutter rejection measure for other
orderings to that of mono first test for large FOV.

Figure A-51. Plot of ratio detection/clutter rejection measure for other
orderings to that of mono first test for large FOV.

[Plot title: Small Narrow FOV Mono First vs Other Presentation Orderings]

Figure A-52. Plot of ratio detection/clutter rejection measure for other
orderings to that of mono first test for large FOV.

Figure A-53. Plot of ratio detection/clutter rejection measure for other
orderings to that of mono first test for large FOV.

[Plot title: Small FOV Mono First vs Other Presentation Orderings]

[Plot title: False Alarms for Small FOVs]

Figure A-54. Plot of mono vision versus stereo vision false alarms for the
small FOV cases.

Figure A-55. Comparison of the false target detections or false alarms for
mono vision versus stereo vision for the small FOV scenes for both narrow
and wide stereo baselines.

[Plot title: Small FOV False Alarms Mono versus Stereo Vision]